
neural network language model




Testing learning hypotheses using neural networks by manipulating learning data

Leong, Cara Su-Yi, Linzen, Tal

arXiv.org Artificial Intelligence

Although passivization is productive in English, it is not completely general -- some exceptions exist (e.g. *One hour was lasted by the meeting). How do English speakers learn these exceptions to an otherwise general pattern? Using neural network language models as theories of acquisition, we explore the sources of indirect evidence that a learner can leverage to learn whether a verb can passivize. We first characterize English speakers' judgments of exceptions to the passive, confirming that speakers find some verbs more passivizable than others. We then show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans, suggesting that evidence for these exceptions is available in the linguistic input. We test the causal role of two hypotheses for how the language model learns these restrictions by training models on modified training corpora, which we create by altering the existing training corpora to remove features of the input implicated by each hypothesis. We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not. This study highlights the utility of altering a language model's training data for answering questions where complete control over a learner's input is vital.


Tuor

AAAI Conferences

Automated analysis methods are crucial aids for monitoring and defending a network to protect the sensitive or confidential data it hosts. This work introduces a flexible, powerful, and unsupervised approach to detecting anomalous behavior in computer and network logs; one that largely eliminates domain-dependent feature engineering employed by existing methods. By treating system logs as threads of interleaved "sentences" (event log lines) to train online unsupervised neural network language models, our approach provides an adaptive model of normal network behavior. We compare the effectiveness of both standard and bidirectional recurrent neural network language models at detecting malicious activity within network log data. Extending these models, we introduce a tiered recurrent architecture, which provides context by modeling sequences of users' actions over time. Compared to Isolation Forest and Principal Components Analysis, two popular anomaly detection algorithms, we observe superior performance on the Los Alamos National Laboratory Cyber Security dataset. For log-line-level red team detection, our best performing character-based model provides test set area under the receiver operating characteristic curve of 0.98, demonstrating the strong fine-grained anomaly detection performance of this approach on open vocabulary logging sources.
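The core idea above -- score each log line by how unlikely a language model trained on normal traffic finds it -- can be illustrated with a deliberately tiny stand-in. The sketch below uses a smoothed character-bigram model instead of the paper's recurrent networks; the class name and smoothing choice are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
import math

class CharBigramLM:
    """Tiny character-bigram LM for scoring log lines; a minimal
    stand-in for the recurrent models described in the abstract."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, lines):
        # "^" and "$" mark line start/end, mirroring sentence boundaries.
        for line in lines:
            for prev, cur in zip("^" + line, line + "$"):
                self.counts[prev][cur] += 1
                self.vocab.update((prev, cur))

    def anomaly_score(self, line):
        """Average negative log-probability per character:
        higher means the line looks less like normal traffic."""
        vocab_size = len(self.vocab) or 1
        pairs = list(zip("^" + line, line + "$"))
        nll = 0.0
        for prev, cur in pairs:
            total = sum(self.counts[prev].values())
            p = (self.counts[prev][cur] + self.alpha) / (total + self.alpha * vocab_size)
            nll -= math.log(p)
        return nll / len(pairs)
```

In use, lines whose score exceeds a threshold (tuned on held-out normal data) would be flagged for analyst review; the paper's character-based recurrent models play the same role with far richer context.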


What Does It Mean for AI to Understand?

#artificialintelligence

Remember IBM's Watson, the AI Jeopardy! champion? A 2010 promotion proclaimed, "Watson understands natural language with all its ambiguity and complexity." However, as we saw when Watson subsequently failed spectacularly in its quest to "revolutionize medicine with artificial intelligence," a veneer of linguistic facility is not the same as actually comprehending human language. Natural language understanding has long been a major goal of AI research. At first, researchers tried to manually program everything a machine would need to make sense of news stories, fiction or anything else humans might write.


What changes OpenAI's GPT-3 and other models brought to us

#artificialintelligence

In June last year, OpenAI released GPT-3. Composed of 175 billion parameters and trained at a cost of tens of millions of dollars, it was the largest artificial intelligence language model ever produced. From answering questions to writing articles and poems, and even writing in slang, everything is covered. The full name of GPT-3 is Generative Pretrained Transformer-3: the third in the series of generative pretrained transformers, more than 100 times larger than 2019's GPT-2. GPT-3 has 175 billion parameters, while the second largest language model has 17 billion parameters.


LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

Beck, Eugen, Zhou, Wei, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine Learning

LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but recombine hypotheses that share the last two words; afterwards we rescore the resulting lattice. We run our systems on GPGPU equipped machines and are able to produce competitive results on the Hub5'00 and Librispeech evaluation corpora with a runtime better than real-time. In addition we briefly investigate the possibility of carrying out the full sum over all state-sequences belonging to a given word-hypothesis during decoding without recombination.
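The recombination step described in the abstract -- merging first-pass hypotheses that share their last two words so the search space stays tractable -- can be sketched as follows. The function name and the (words, score) pair representation are illustrative assumptions, not the paper's actual decoder data structures.

```python
def recombine(hypotheses):
    """Keep only the best-scoring hypothesis per last-two-words key.

    `hypotheses` is a list of (word_sequence, score) pairs, where a
    higher score is better. Hypotheses sharing their final two words
    are merged, which bounds how many LSTM-LM states the decoder
    must carry forward; the lattice is rescored afterwards to
    recover detail lost in the merge.
    """
    best = {}
    for words, score in hypotheses:
        key = tuple(words[-2:])  # hypotheses sharing the last two words merge
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    return list(best.values())
```

Keying on only the final two words is what makes this lossy, which is why the approach pairs it with a second rescoring pass over the resulting lattice.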


IBM Sets New Transcription Performance Milestone on Automatic Broadcast News Captioning

#artificialintelligence

Two years ago IBM set new performance records on conversational telephone speech (CTS) transcription, by benchmarking its deep neural network based speech recognition systems on the Switchboard and Callhome corpora, two popular publicly available data sets for automatic speech recognition [1]. Here we show that this impressive performance holds on other audio genres. Similar to the CTS benchmarks, the industry has for many years evaluated system performance on multimedia audio signals with broadcast news (BN) captioning. We have now achieved a new industry record of 6.5% and 5.9% on two BN benchmarks: RT04 and DEV04F [2]. Both these test sets have been released in the past by the Linguistic Data Consortium (LDC) [3].


Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

Kumar, Shankar, Nirschl, Michael, Holtmann-Rice, Daniel, Liao, Hank, Suresh, Ananda Theertha, Yu, Felix

arXiv.org Machine Learning

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally more expensive than N-gram LMs for decoding, and thus, challenging to integrate into speech recognizers. Recent research has proposed the use of lattice-rescoring algorithms using RNNLMs and LSTMLMs as an efficient strategy to integrate these models into a speech recognition system. In this paper, we evaluate existing lattice rescoring algorithms along with new variants on a YouTube speech recognition task. Lattice rescoring using LSTMLMs reduces the word error rate (WER) for this task by 8% relative to the WER obtained using an N-gram LM. Index Terms: LSTM, language modeling, lattice rescoring, speech recognition. A language model (LM) is a crucial component of a statistical speech recognition system [1]. Because N-gram LMs condition only on short-range contexts, they are powerful for tasks such as voice-search where short-range contexts suffice, but they do not perform as well at tasks such as transcription of long-form speech content, which require modeling of long-range contexts [2].
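The rescoring strategy the abstract describes -- apply the expensive LSTM-LM only to a compact set of first-pass hypotheses rather than during decoding -- can be sketched on an n-best list, a simple pruned stand-in for a lattice. The function name, the interpolation weight, and the `lstm_lm_logprob` callback are illustrative assumptions, not the paper's actual algorithms.

```python
def rescore_nbest(nbest, lstm_lm_logprob, lm_weight=0.5):
    """Rescore an n-best list (a pruned stand-in for a lattice).

    Each entry in `nbest` is a (word_sequence, first_pass_score)
    pair; `lstm_lm_logprob` is assumed to map a word sequence to
    its total log-probability under the stronger LM. The combined
    score re-ranks the hypotheses, and higher is better.
    """
    rescored = []
    for words, first_pass_score in nbest:
        total = first_pass_score + lm_weight * lstm_lm_logprob(words)
        rescored.append((words, total))
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

True lattice rescoring is more involved -- hypotheses share arcs, so LM states must be expanded or approximated along lattice paths -- but the cost structure is the same: the neural LM is consulted only after the cheap first pass has pruned the search space.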


Sentences with style and topic

#artificialintelligence

In this week's post we will have a closer look at a paper dealing with the modeling of style, topic and high-level syntactic structure in language models by introducing global distributed latent representations. In particular, the variational autoencoder seems to be a promising candidate for pushing generative language models forward and including global features. Recurrent neural network language models are known to be capable of modeling complex distributions over sequences. However, their architecture limits them to modeling local statistics over sequences, and therefore global features have to be captured otherwise. Non-generative language models include the standard recurrent neural network language model, which predicts words depending on previously seen words and does not learn a global vector representation of the sequence at any time.